Abstract

In time-varying wireless networks, the states of the communication channels are subject to random
variations, and hence need to be estimated for efficient rate adaptation and scheduling. The
estimation mechanism possesses inaccuracies that need to be tackled in a probabilistic framework. In
this work, we study scheduling with rate adaptation in single-hop queueing networks under two levels
of channel uncertainty: when the channel estimates are inaccurate but complete knowledge of the
channel/estimator joint statistics is available at the scheduler; and when the knowledge of the joint
statistics is incomplete. In the former case, we characterize the network stability region and show
that a maximum-weight type scheduling policy is throughput-optimal. In the latter case, we propose
a joint channel statistics learning-scheduling policy. With an associated trade-off in average packet
delay and convergence time, the proposed policy has a stability region arbitrarily close to the stability
region of the network under full knowledge of channel/estimator joint statistics.
1 Introduction



Scheduling in wireless networks is a critical component of resource allocation that aims to maximize the
overall network utility subject to link interference and queue stability constraints. Since the seminal
paper by Tassiulas and Ephremides ([1]), maximum-weight type algorithms have been intensely studied
(e.g., [2]-[8]) and found to be throughput-optimal in various network settings. The majority of existing
works employing maximum-weight type schedulers are based on the assumption that full knowledge of
channel state information (CSI) is available at the scheduler. In realistic scenarios, however, due to
random variations in the channel, full CSI is rarely, if ever, available at the scheduler. The dynamics of
the scheduling problem with imperfect CSI are, therefore, vastly different from those of the problem with
full CSI in the following two ways: (1) a non-trivial amount of network resource, that could otherwise be
used for data transmission, is spent in learning the channel; (2) the acquired information on the channel is
potentially inaccurate, essentially underscoring the need for intelligent rate adaptation and user scheduling.
Realistic networks are thus characterized by a convolved interplay between channel estimation, rate
adaptation, and multiuser scheduling mechanisms.

These complicated dynamics are studied under various network settings in recent works ([9]-[15]).
In [9], the authors study scheduling in single-hop wireless networks with Markov-modeled binary ON-OFF
channels. Here scheduling decisions are made based on cost-free estimates of the channel obtained
once every few slots. The authors show that a maximum-weight type scheduling policy, that takes into
account the probabilistic inaccuracy in the channel estimates and the memory in the Markovian channel,
is throughput-optimal. In [10], the authors study decentralized scheduling under partial CSI in multi-hop
wireless networks with Markov-modeled channels. Here, each user knows its channel perfectly and
has access to delayed CSI of other users' channels. The authors characterize the stability region of the
network and show that a maximum-weight type threshold policy, implemented in a decentralized fashion
at each user, is throughput-optimal.

In [12], the authors study scheduling under imperfect CSI in single-hop networks with independent
and identically distributed (i.i.d.) channels. They consider a two-stage decision setup: in the first stage,
the scheduler decides whether to estimate the channel with a corresponding energy cost; in the second
stage, scheduling with rate adaptation is performed based on the outcome of the first stage. Under
this setup, the authors propose a maximum-weight type scheduling policy that minimizes the energy
consumption subject to queue stability.

While studying scheduling under imperfect CSI is a first step in the right direction, these works 
assume that complete knowledge of the channel/estimator joint statistics, which is crucial for the success 
of opportunistic scheduling, is readily available at the scheduler. This is another simplifying assumption 
that need not always hold in reality. Taking note of this, we study scheduling in single-hop networks 
under imperfect CSI, and when the knowledge of the channel/estimator joint statistics is incomplete at 
the scheduler. We propose a joint statistics learning-scheduling policy that allocates a fraction of the 
time slots (the exploration slots) to continuously learn the channel/estimator statistics, which in turn is 
used for scheduling and rate adaptation during data transmission slots. Note that our setup is similar to
the setup considered in [15]. Here the author considers a two-stage decision setup. When applied to the
scheduling problem, this work can be interpreted as follows. One of K estimators is chosen to estimate 
the channel in the first stage, with unknown channel/estimator joint statistics. The second stage decision 
is made to minimize a known function of the estimate obtained in the first stage. Our problem is different 
from this setup in that the channel/estimator joint statistics is important to optimize the second stage 
decision in our problem - i.e., scheduling with rate adaptation. This is not the case in [15] where a known 
function of the estimate is optimized and the channel/estimator joint statistics is helpful only in the first 
stage that decides one of K estimators. Our contribution is two-fold: 

• When complete knowledge of the channel/estimator joint statistics is available at the scheduler, we 
characterize the network stability region and show that a simple maximum-weight type scheduling 
policy is throughput-optimal. It is worth contrasting this result with those in [9]-[12]. In these
works, imperfection of CSI is assumed to be caused by specific factors like delayed channel feedback,
infrequent channel measurement, etc., whereas, in our model, since the channel/estimator joint
statistics is unconstrained, the CSI inaccuracy is captured in a more general probabilistic framework.

• Using the preceding system level results as a benchmark, we study scheduling under incomplete 
knowledge of the channel/estimator joint statistics. We propose a scheduling policy with an in-built 
statistics learning mechanism and show that, with a corresponding trade-off in the average packet 
delay before convergence, the stability region of the proposed policy can be pushed arbitrarily close 
to the network stability region under full knowledge of channel/estimator statistics. 

The paper is organized as follows. Section 2 formalizes the system model. In Section 3, we characterize
the stability region of the network and propose a throughput-optimal scheduling policy. In Section 4,
we study joint statistics learning-scheduling and rate adaptation when the scheduler has incomplete
knowledge of channel/estimator statistics. Concluding remarks are provided in Section 5.

2 System Model 

We consider a wireless downlink communication scenario with one base station and N mobile users. 
Data packets to be transmitted from the base station to the users are stored in N separate queues at 
the base station. Time is slotted with the slots of all the users synchronized. The channel between the 
base station and each user is i.i.d. across time slots and independent across users. We do not assign any 
specific distribution to the channels throughout this work. The channel state of a user in a slot denotes 
the number of packets that can be successfully transmitted without outage to that user, in that slot.
Transmission at a rate at or below the channel state always succeeds, while transmission at a rate above the
channel state always fails. We assume the channel state lies in a finite discrete state space $S$. Let $C_i[t]$
be the random variable denoting the channel state of user $i$ in slot $t$. The channel state of the network
in slot $t$ is denoted by the vector $C[t] = [C_1[t], C_2[t], \ldots, C_N[t]] \in S^N$. In each slot, the scheduler has
access to estimates of the channel states, i.e., $\hat{C}[t] = [\hat{C}_1[t], \hat{C}_2[t], \ldots, \hat{C}_N[t]] \in S^N$. The estimator is
fixed for each user and the estimates are independent across users. The channel/estimator joint statistics
for user $i$ is given by the $|S|^2$ probabilities $P(C_i = c_i, \hat{C}_i = \hat{c}_i)$, $\forall c_i \in S$, $\hat{c}_i \in S$.

We adopt the one-hop interference model, where, in each slot, only one user is scheduled for data
transmission. The scheduler (base station), based on the channel estimate and the queue length
information, decides which user to schedule and performs rate adaptation in order to maximize the overall
network stability region. Let $I[t]$ and $R[t]$ denote the index of the user scheduled to transmit and the
corresponding rate of transmission, respectively, at slot $t$. Due to potential mismatch between the channel
estimates and the actual channels, it is possible that the allocated rate is larger than the actual channel
rate, thus leading to outage. In this case, the packet is retained at the head of the queue and a
retransmission will be attempted later. Let $Q_i[t]$ denote the state (length) of queue $i$ at the beginning of slot
$t$. Let $A_i[t]$ denote the number of exogenous packet arrivals at queue $i$ at the beginning of slot $t$ with
$E[A_i[t]] = \lambda_i$. The queue state evolution can now be written as a discrete stochastic process:

$$Q_i[t+1] = \big[Q_i[t] - \mathbf{1}(I[t] = i)\, R[t] \cdot \mathbf{1}(R[t] \le C_i[t])\big]^+ + A_i[t], \qquad (1)$$

where $[\cdot]^+ = \max\{0, \cdot\}$. We adopt the following definition of queue stability [2]: Queue $i$ is stable if there
exists a limiting stationary distribution $F_i$ such that $\lim_{t \to \infty} P(Q_i[t] \le q) = F_i(q)$.
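
The queue recursion (1) can be sketched in a few lines of Python. This is only an illustration: the function name `queue_step` and its argument layout are assumptions made here, not part of the paper.

```python
def queue_step(q, arrivals, scheduled, rate, channel):
    """One slot of the queue recursion (1).

    q, arrivals: per-user queue lengths and new arrivals (lists of numbers).
    scheduled:   index I[t] of the user chosen in this slot.
    rate:        transmission rate R[t] assigned to that user.
    channel:     true channel states C_i[t], unknown to the scheduler.
    """
    next_q = []
    for i, (qi, ai) in enumerate(zip(q, arrivals)):
        served = 0
        # Only the scheduled user is served, and only if there is no outage.
        if i == scheduled and rate <= channel[i]:
            served = rate
        # The queue cannot go negative; arrivals are added after service.
        next_q.append(max(0, qi - served) + ai)
    return next_q
```

When the assigned rate exceeds the true channel state, the served amount is zero and the queue only grows by the arrivals, which is the retransmission behaviour described above.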

3 Full Knowledge of Channel/Estimator Joint Statistics 

In this section, we consider the scenario where the scheduler has full access to the channel/estimator
joint statistics, i.e., $P(C_i = c_i, \hat{C}_i = \hat{c}_i)$, $\forall c_i \in S$, $\hat{c}_i \in S$, for $i \in \{1, \ldots, N\}$. We characterize the
network stability region next.

3.1 Network Stability Region 

Consider the class of stationary scheduling policies $G$ that base their decision on the current queue length
information $[Q_1, \ldots, Q_N]$, the channel estimates $[\hat{C}_1, \ldots, \hat{C}_N]$, and full knowledge of channel/estimator
joint statistics. Define the network stability region as the closure of the set of arrival rates that can be
supported by the policies in $G$ without leading to system instability. Let $P_{\hat{c}}(\hat{c} = [\hat{c}_1, \ldots, \hat{c}_N])$ denote the
probability of the channel estimate vector. Thus,

$$P_{\hat{c}}(\hat{c} = [\hat{c}_1, \ldots, \hat{c}_N]) = \prod_{i=1}^{N} P(\hat{C}_i = \hat{c}_i), \qquad (2)$$

where the probabilities $P(\hat{C}_i = \hat{c}_i)$ are evaluated from the knowledge of the channel/estimator joint
statistics. Defining $CH[A]$ as the convex hull ([16]) of set $A$ and $\mathbf{1}_i$ as the $i$th coordinate vector, we record
our result on the network stability region below.

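Proposition 1. The stability region of the network is given by

$$\Lambda = \sum_{\hat{c} \in S^N} P_{\hat{c}}(\hat{c}) \cdot CH\big[\mathbf{0},\; P(C_i \ge r^*(\hat{c}_i) \mid \hat{C}_i = \hat{c}_i)\, r^*(\hat{c}_i) \cdot \mathbf{1}_i,\; i = 1, \ldots, N\big],$$

where $r^*(\hat{c}_i) = \arg\max_{r \in S} \{P(C_i \ge r \mid \hat{C}_i = \hat{c}_i) \cdot r\}$ and the conditional probabilities
$P(C_i \ge r^*(\hat{c}_i) \mid \hat{C}_i = \hat{c}_i)$ are evaluated from the knowledge of the channel/estimator joint statistics.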

Proof Outline: The proof contains two parts. We first show that any rate vector $\lambda$ strictly within $\Lambda$ is
stably supportable by some randomized stationary policy. In the second part, we establish that any arrival
rate $\lambda$ outside $\Lambda$ is not supportable by any policy. We show this by first identifying a hyperplane that
separates $\lambda$ and $\Lambda$ using the strict separation theorem ([17]). We then define an appropriate Lyapunov
function and show that, for any scheduling policy, there exists a positive drift, thus rendering the queues
unstable ([18]). Details of the proof are available in Appendix A.

3.2 Optimal Scheduling and Rate Allocation 

In this section, we propose a maximum-weight type scheduling policy with rate adaptation and show that 
it is throughput-optimal, i.e., it can support any arrival rate that can be supported by any other policy 
in G. The policy is introduced next. 
Scheduling Policy $\Psi$

At time slot $t$, the base station makes the scheduling and rate adaptation decisions based on
the channel/estimator joint statistics and the channel estimate vector $\hat{C} = \hat{c}$ (the time index is
dropped for notational simplicity).

(1) Rate Adaptation:

For each user $i$, assign rate $R_i$ such that

$$R_i = \arg\max_{r \in S} \{P(C_i \ge r \mid \hat{C}_i = \hat{c}_i) \cdot r\}.$$

(2) Scheduling Decision:

Schedule the user $I$ that maximizes the queue-weighted rate as follows:

$$I = \arg\max_{i} \{Q_i \cdot P(C_i \ge R_i \mid \hat{C}_i = \hat{c}_i) \cdot R_i\}.$$

Note that when the channel state estimation is accurate, the conditional probability $P(C_i \ge r \mid \hat{C}_i = \hat{c}_i)$
will be a step function, with $\Psi$ essentially becoming the classic maximum-weight policy in [2]. The next
proposition establishes the throughput optimality of policy $\Psi$. Details are provided in Appendix B.

Proposition 2. The scheduling policy $\Psi$ stably supports all arrival rates that lie in the interior of the
stability region $\Lambda$.

Proof Outline: The proof proceeds as follows. Consider a Lyapunov function $L(Q[t]) = \sum_{i=1}^{N} Q_i^2[t]$.
For any arrival rate $\lambda$ that lies strictly within the stability region $\Lambda$, we know it is stably supportable by
some policy $G_0$. Under $G_0$, we show that the corresponding Lyapunov drift is negative. We then show
that policy $\Psi$ minimizes the Lyapunov drift and hence it will have a negative drift, thus establishing the
throughput optimality of $\Psi$.

The results obtained thus far when the channel/estimator joint statistics is available at the scheduler
are along expected lines. Nonetheless, they serve as a benchmark to the rest of the work under incomplete
knowledge of the channel/estimator joint statistics, which is the main focus of the paper.

4 Incomplete Knowledge of Channel/Estimator Joint Statistics

In this section, we study scheduling with rate adaptation when the scheduler only has knowledge of the
marginal statistics of the estimator, i.e., $P(\hat{C}_i = \hat{c}_i)$, $\forall \hat{c}_i \in S$, $i \in \{1, \ldots, N\}$, and hence, the knowledge
of the channel/estimator joint statistics is incomplete at the scheduler. We first illustrate, with a simple
example, that significant system level losses are incurred when no effort is made to learn these statistics,
and hence no rate adaptation is performed.

4.1 Illustration of the Gains from Rate Adaptation

With incomplete information on the channel-estimator joint statistics, the scheduler naively trusts the
channel estimates to be actual channel states and transmits at the rate allowed in this state. Under this
scheduling structure, for the single-hop network we consider, the stability region is given in Appendix C
by

$$\sum_{\hat{c} \in S^N} P_{\hat{c}}(\hat{c}) \cdot CH\big[\mathbf{0},\; P(C_i \ge \hat{c}_i \mid \hat{C}_i = \hat{c}_i)\, \hat{c}_i \cdot \mathbf{1}_i,\; i = 1, \ldots, N\big]. \qquad (3)$$
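
The two decisions of the maximum-weight policy of Section 3.2, which the naive scheduler above skips, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the nested layout `success[i][c_hat][r]` for the conditional probabilities $P(C_i \ge r \mid \hat{C}_i = \hat{c})$ and the names `adapt_rate` and `schedule` are assumptions introduced here.

```python
def adapt_rate(success_given_estimate):
    """Rate adaptation: pick the rate r maximizing P(C >= r | estimate) * r.

    success_given_estimate maps each candidate rate r in S to
    P(C >= r | C_hat = c_hat) for one user and one observed estimate.
    Returns (best_rate, expected_rate).
    """
    best = max(success_given_estimate,
               key=lambda r: success_given_estimate[r] * r)
    return best, success_given_estimate[best] * best


def schedule(queues, estimates, success):
    """Scheduling decision: serve the user with the largest
    queue-weighted expected rate.

    success[i][c_hat][r] = P(C_i >= r | C_hat_i = c_hat)  (assumed layout).
    Returns (user_index, rate).
    """
    best_user, best_rate, best_weight = None, None, float("-inf")
    for i, (q, c_hat) in enumerate(zip(queues, estimates)):
        rate, expected = adapt_rate(success[i][c_hat])
        if q * expected > best_weight:
            best_user, best_rate, best_weight = i, rate, q * expected
    return best_user, best_rate
```

A naive scheduler would correspond to replacing `adapt_rate` by a function that simply returns the estimate itself as the rate.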




Figure 1: Illustration of the system level gains associated with joint scheduling and rate control.
$P(\hat{C}_i = k \mid C_i = k)$ denotes the probability that the channel estimate of user $i$ is $k$ given the actual channel of
user $i$ is $k$. (a) $P(\hat{C}_i = k \mid C_i = k) = 0.8$, $k \in \{0.2, 1\}$, $i \in \{1, 2\}$; (b) $P(\hat{C}_i = k \mid C_i = k) = 0.4$,
$k \in \{0.2, 1\}$, $i \in \{1, 2\}$.



For a two-user single-hop network, this region is plotted in Fig. 1 alongside the network stability region
when full knowledge of the channel/estimator joint statistics is available at the scheduler and hence rate
adaptation is performed. The channel between the base station and each user is independent and binary
($S = \{0.2, 1\}$) with $P(C_i = 1) = 0.8$, for $i = 1, 2$. For different degrees of mismatch between the channel and
the estimate, Fig. 1 plots the stability region of the system when rate adaptation is performed and when
it is not. Note the significant reduction in the stability region when rate adaptation is not performed.
This loss increases with the degree of channel-estimator mismatch. The preceding example
underscores the importance of rate adaptation and hence the need to learn the channel/estimator joint
statistics. We now proceed to introduce our joint statistics learning-scheduling policy.
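
The size of this loss can be checked numerically for a single user. The sketch below computes the largest arrival rate one user could sustain if it were scheduled in every slot, with and without rate adaptation. It assumes a binary channel in which a wrong estimate always reports the other state, so that $P(\hat{C} \ne k \mid C = k) = 1 - P(\hat{C} = k \mid C = k)$; the function name `intercepts` is hypothetical.

```python
def intercepts(states, p_channel, accuracy):
    """Return (adaptive, naive) single-user throughput for a binary channel.

    states:    the two channel states, e.g. (0.2, 1).
    p_channel: dict state -> P(C = state).
    accuracy:  P(C_hat = k | C = k), assumed equal for both states.
    """
    lo, hi = states
    other = {lo: hi, hi: lo}
    # Joint pmf P(C = c, C_hat = c_hat) built from the assumed error model.
    joint = {}
    for c in states:
        joint[(c, c)] = p_channel[c] * accuracy
        joint[(c, other[c])] = p_channel[c] * (1 - accuracy)

    def success(rate, c_hat):
        # P(C >= rate, C_hat = c_hat)
        return sum(joint[(c, c_hat)] for c in states if c >= rate)

    # With rate adaptation: best rate for each observed estimate.
    adaptive = sum(max(success(r, c_hat) * r for r in states) for c_hat in states)
    # Without rate adaptation: transmit at the estimate itself.
    naive = sum(success(c_hat, c_hat) * c_hat for c_hat in states)
    return adaptive, naive
```

With the parameters used above ($S = \{0.2, 1\}$, $P(C_i = 1) = 0.8$), the sketch gives roughly 0.8 with rate adaptation against 0.704 without it at accuracy 0.8, and roughly 0.8 against 0.432 at accuracy 0.4, in line with the observation that the loss grows with the mismatch.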

4.2 Joint Statistics Learning - Scheduling Policy 

We design the policy with the following main components: (1) The fraction of time slots the policy
spends in learning the channel/estimator joint statistics is fixed at $\gamma \in (0, 1)$; (2) The worst-case rate
of convergence of the statistics learning process is maximized. We formally introduce the policy next,
followed by a discussion on the policy design.

Joint statistics learning-scheduling policy
(parameterized by $\gamma$)

(1) In each slot, the scheduler first decides whether to explore the channel of one of the users or
transmit data to one of the users. Specifically, it randomly decides to explore the channel of user $i$
with probability $x^i_{\hat{c}_i}/N$ where $\sum_{i=1}^{N} x^i_{\hat{c}_i}/N \le 1$. The quantity $x^i_{\hat{c}_i} \in (0, 1]$ is a function of $\gamma$ and the
channel estimate, $\hat{c}_i$, of user $i$. It is optimized to maximize the worst-case rate of convergence of
the statistics learning mechanism subject to the $\gamma$ constraint. We postpone the discussion on this
optimization to Proposition 4. Note that we have dropped the time index from the estimates for
ease of notation.

(2) If a user is chosen for exploration, this time slot becomes an observing slot. Call the chosen user
$e$. The scheduler now sends data at a rate $r$ that is chosen uniformly at random from the set $S$.
Let the quantity $\xi(t)$ indicate whether the transmission was successful or not:

$$\xi(t) = \mathbf{1}(c_e \ge r),$$

where, recall, $c_e$ denotes the current channel state of user $e$. Let $\Theta_{i,\hat{c},r}$ denote the set of exploration
time slots when the channel estimate of user $i$ was $\hat{c}$ and user $i$ was explored with rate $r$. Thus,
the current slot is added to the set $\Theta_{e,\hat{c}_e,r}$. Now, an estimate of the quantity $P(C_e \ge r \mid \hat{C}_e = \hat{c}_e)$ is
obtained using the following update:

$$\hat{P}_t(C_e \ge r \mid \hat{C}_e = \hat{c}_e) = \frac{1}{|\Theta_{e,\hat{c}_e,r}|} \sum_{k \in \Theta_{e,\hat{c}_e,r}} \xi(k),$$

where $|V|$ denotes the cardinality of set $V$. We assume $\hat{P}_t(C_e \mid \hat{C}_e)$ to be uniform when $\Theta_{e,\hat{c}_e,r} = \emptyset$,
i.e., $\hat{P}_t(C_e \ge r \mid \hat{C}_e = \hat{c}_e) = 1 - r/|S|$.

(3) With probability $1 - \sum_{i=1}^{N} x^i_{\hat{c}_i}/N$, no user is chosen for exploration and the slot is used for data
transmission. The scheduler follows policy $\Psi$ introduced in the previous section with $P(C_i \ge r \mid \hat{C}_i = \hat{c}_i)$
replaced by the estimate $\hat{P}_t(C_i \ge r \mid \hat{C}_i = \hat{c}_i)$.

Figure 2: Illustration of the joint channel learning-scheduling policy.
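
The bookkeeping of the exploration step (2) amounts to keeping a success count and a trial count for every (estimate, rate) pair. A minimal sketch for a single user follows; the class name `SuccessEstimator` and its methods are assumptions made for illustration, and the prior used before any probe is the $1 - r/|S|$ value stated in step (2).

```python
import random
from collections import defaultdict


class SuccessEstimator:
    """Empirical estimate of P(C >= r | C_hat = c_hat) for one user."""

    def __init__(self, states):
        self.states = sorted(states)
        self.trials = defaultdict(int)     # (c_hat, r) -> number of exploration slots
        self.successes = defaultdict(int)  # (c_hat, r) -> number of successful probes

    def explore(self, c_hat, channel, rng=random):
        """Probe at a uniformly random rate and record whether it succeeded."""
        r = rng.choice(self.states)
        self.trials[(c_hat, r)] += 1
        self.successes[(c_hat, r)] += int(channel >= r)
        return r

    def estimate(self, c_hat, r):
        """Sample mean of the success indicator; 1 - r/|S| before any probe."""
        n = self.trials[(c_hat, r)]
        if n == 0:
            return 1 - r / len(self.states)
        return self.successes[(c_hat, r)] / n
```

In a transmission slot, the values returned by `estimate` would take the place of the true conditional probabilities in the rate adaptation and scheduling decisions.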



An illustration of the proposed policy is provided in Fig. 2. We now discuss the design of the quantities
$x^i_{\hat{c}_i}$, $\hat{c}_i \in S$, $i \in \{1, \ldots, N\}$. Let $\eta_{i,\hat{c}_i} = P(\hat{C}_i = \hat{c}_i)\, x^i_{\hat{c}_i}/N$ be a measure of how often the channel of user $i$
is explored when the estimate is $\hat{c}_i$. For fairness considerations, we impose the following constraint in
addition to the $\gamma$-constraint discussed earlier:

$$\sum_{\hat{c}_i \in S} \eta_{i,\hat{c}_i} = \gamma/N.$$

The preceding constraint ensures that each user's channel is explored for an equal fraction, $\gamma/N$, of
the total time slots. From the strong law of large numbers, with probability one, $\hat{P}_t(C_i \ge r \mid \hat{C}_i = \hat{c}_i)$ will
converge to $P(C_i \ge r \mid \hat{C}_i = \hat{c}_i)$ as $t$ tends to infinity. The rate of convergence of the channel/estimate
joint statistics, parameterized by the user and the channel estimate, is given by the following lemma.
Henceforth, we drop the suffix $i$ from $\hat{c}_i$ for notational convenience.

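Lemma 3.

$$\limsup_{t \to \infty} \frac{\hat{P}_t(C_i \ge r \mid \hat{C}_i = c) - P(C_i \ge r \mid \hat{C}_i = c)}{\sqrt{\big(2 \log\log(\eta_{i,c}\, t/|S|)\big) / \big(\eta_{i,c}\, t/|S|\big)}} = \sigma$$

almost surely (a.s.), where

$$\sigma = \sqrt{P(C_i \ge r \mid \hat{C}_i = c)\,\big(1 - P(C_i \ge r \mid \hat{C}_i = c)\big)}.$$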


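Proof. We use $N_r[t]$ to denote the number of exploration slots corresponding to estimated channel $\hat{C}_i = c$
and rate $r$. We express the left hand side of the equation in the lemma as follows.

$$\limsup_{t \to \infty} \frac{\hat{P}_t(C_i \ge r \mid \hat{C}_i = c) - P(C_i \ge r \mid \hat{C}_i = c)}{\sqrt{\big(2 \log\log(\eta_{i,c}\, t/|S|)\big) / \big(\eta_{i,c}\, t/|S|\big)}}
= \limsup_{t \to \infty} \left[ \frac{\hat{P}_t(C_i \ge r \mid \hat{C}_i = c) - P(C_i \ge r \mid \hat{C}_i = c)}{\sqrt{(2 \log\log N_r[t]) / N_r[t]}} \cdot \sqrt{\frac{\log\log N_r[t]}{\log\log(\eta_{i,c}\, t/|S|)} \cdot \frac{\eta_{i,c}\, t/|S|}{N_r[t]}} \right]. \qquad (4)$$

From the Law of the Iterated Logarithm ([19]), we get

$$\limsup_{t \to \infty} \frac{\hat{P}_t(C_i \ge r \mid \hat{C}_i = c) - P(C_i \ge r \mid \hat{C}_i = c)}{\sqrt{(2 \log\log N_r[t]) / N_r[t]}} = \sigma \qquad (5)$$

almost surely. We also have

$$\frac{\log\log N_r[t]}{\log\log(\eta_{i,c}\, t/|S|)} = 1 + \frac{\log\log N_r[t] - \log\log(\eta_{i,c}\, t/|S|)}{\log\log(\eta_{i,c}\, t/|S|)}
= 1 + \frac{\log\Big(1 + \dfrac{\log\big[N_r[t] / (\eta_{i,c}\, t/|S|)\big]}{\log(\eta_{i,c}\, t/|S|)}\Big)}{\log\log(\eta_{i,c}\, t/|S|)}. \qquad (6)$$

Because $\{N_r[t]\}$ is a renewal process ([20]) with mean inter-renewal time $(\eta_{i,c}/|S|)^{-1}$, we will have

$$\lim_{t \to \infty} N_r[t] / (\eta_{i,c}\, t/|S|) = 1 \quad \text{almost surely}.$$

Hence (6) tends to 1 almost surely. Substituting equations (5) and (6) into (4) we get

$$\limsup_{t \to \infty} \frac{\hat{P}_t(C_i \ge r \mid \hat{C}_i = c) - P(C_i \ge r \mid \hat{C}_i = c)}{\sqrt{\big(2 \log\log(\eta_{i,c}\, t/|S|)\big) / \big(\eta_{i,c}\, t/|S|\big)}} = \sigma$$

almost surely. □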



Note from the preceding lemma that, for each $\{i, c\}$, the higher the quantity $\eta_{i,c}$, the faster the
convergence of $\hat{P}_t(C_i \ge r \mid \hat{C}_i = c)$. Also note that, for each user $i$, the channel estimate $c$ with the slowest
convergence affects the overall convergence performance for that user $i$. Taking note of this, we proceed
to design $x^i_c$ that maximizes the lowest convergence rate - the bottleneck.
Figure 3: Illustration of the design of optimal $x^i_c$ when $N = 2$, $\gamma = 0.2$ and $S = \{1, \ldots, 6\}$.

The optimization problem (U) for user $i$ is given by

$$\max_{x^i} \min_{c} \; \eta_{i,c} = \frac{1}{N} P(\hat{C}_i = c)\, x^i_c$$
$$\text{s.t.} \quad \sum_{c \in S} \eta_{i,c} = \gamma/N,$$
$$0 < x^i_c \le 1, \quad \text{for all } c \in S.$$

For ease of exposition, we assume, without loss of generality, that $P(\hat{C}_i = s_1) \le P(\hat{C}_i = s_2) \le \cdots \le
P(\hat{C}_i = s_{|S|})$. Let $[x^{i*}_{s_1}, x^{i*}_{s_2}, \ldots, x^{i*}_{s_{|S|}}]$ be the optimal solution to the above problem. We now record the
structural properties of the optimal solution.

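Proposition 4. The solution $x^{i*}_{s_k}$, $\forall k \in \{1, \ldots, |S|\}$, to the optimization problem (U) can be obtained
with the following algorithm:

(1) Initialization: Let $k = 1$, $\Gamma = \emptyset$, $\omega = 0$;

(2) If $P(\hat{C}_i = s_k) \ge \dfrac{\gamma - \sum_{s_j \in \Gamma} P(\hat{C}_i = s_j)}{|S| - \omega}$, then,

$$\forall l \ge k, \quad x^{i*}_{s_l} = \frac{\gamma - \sum_{s_j \in \Gamma} P(\hat{C}_i = s_j)}{(|S| - \omega) \cdot P(\hat{C}_i = s_l)}.$$

Algorithm terminates.

(3) Otherwise $x^{i*}_{s_k} = 1$, $\Gamma = \Gamma \cup \{s_k\}$, $\omega = \omega + 1$, $k = k + 1$. If $\Gamma = S$, algorithm terminates,
otherwise repeat Step (2).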



Proof Outline: The proof proceeds by establishing two crucial properties of the optimal solution. First,
define $\Omega_i$ as the set of all channel estimates $s_k$ such that the optimal $x^{i*}_{s_k} = 1$. Thus $\Omega_i = \cup_k \{s_k : x^{i*}_{s_k} = 1\}$.
If no such estimate exists, $\Omega_i = \emptyset$. The optimal solution has the following properties:

(1) If $\Omega_i = \emptyset$, then $P(\hat{C}_i = s_k)\, x^{i*}_{s_k} = \gamma/|S|$, $\forall k$.

(2) If $\Omega_i \ne \emptyset$, then $x^{i*}_{s_1} = 1$.

Recall that the channel states are ordered such that $P(\hat{C}_i = s_1) \le P(\hat{C}_i = s_2) \le \cdots \le P(\hat{C}_i = s_{|S|})$.
The first property essentially says that if there does not exist a channel estimate $s_k$ for which $x^{i*}_{s_k} = 1$,
then the optimal solution is such that the learning rate $P(\hat{C}_i = s_k)\, x^{i*}_{s_k}/N$ is uniform, equal to $\gamma/(N|S|)$, for all $s_k$,
$k \in \{1, \ldots, |S|\}$. Because, otherwise, there is always room to improve the bottleneck convergence rate by
redesigning the quantities $x^{i*}_{s_k}$. The second property says that whenever there exists an estimate $s_k$, $k \ne 1$,
for which $x^{i*}_{s_k} = 1$, the estimate $s_1$ acts as a bottleneck, and the optimal value of $x^{i*}_{s_1}$ must be 1. The
proposed algorithm now checks whether a solution yielding uniform convergence rate is feasible. If so, the
solution is trivially given by $x^{i*}_{s_k} = \gamma / \big(|S| \cdot P(\hat{C}_i = s_k)\big)$, for all $k \in \{1, \ldots, |S|\}$. Otherwise, using the preceding
properties, the algorithm assigns $x^{i*}_{s_1} = 1$ and goes on to solve the reduced optimization problem over
$x^{i*}_{s_2}, \ldots, x^{i*}_{s_{|S|}}$, iteratively. Details of the proof can be found in Appendix D.

The proposed algorithm is illustrated in Fig. 3 when $\gamma = 0.2$, $N = 2$ and $S = \{1, \ldots, 6\}$. Focusing
on User 1, Fig. 3(a) plots the probability of the estimated channels and the optimal values of $x^{1*}_s$, $s \in S$.
Note that, the lower the value of $P(\hat{C}_1 = s)$, the higher the assigned $x^{1*}_s$, since the algorithm maximizes
the bottleneck convergence rate $P(\hat{C}_1 = s)\, x^{1*}_s / N$. This is further illustrated in Fig. 3(b) where the optimized
convergence rate is shown to be 'near uniform', underlining the minmax nature of the optimization. Note
that the structure of the minmax algorithm bears some similarity with the water-filling algorithm used
in power allocation across parallel channels ([21]). There the algorithm tries to 'equalize' the sum of two
components (signal and noise powers) across channels, while the minmax algorithm we propose tries to
'equalize' the product of two components ($P(\hat{C}_i = s)$ and $x^{i*}_s$).
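
The iterative procedure of Proposition 4 can be sketched as below for one user. This is an illustrative reading of the algorithm, not the authors' code: the function name `exploration_weights` and the dictionary-based interface are assumptions, and the estimate probabilities are assumed to sum to one with $\gamma \in (0, 1)$.

```python
def exploration_weights(p_est, gamma):
    """Max-min design of the exploration weights x_c for one user.

    p_est: dict mapping estimate value c -> P(C_hat = c), summing to 1.
    gamma: exploration budget in (0, 1), i.e. sum_c P(C_hat = c) * x_c = gamma.
    Returns dict c -> x_c with 0 < x_c <= 1.
    """
    order = sorted(p_est, key=p_est.get)  # ascending probability, as in the text
    x = {}
    used = 0.0                            # mass of estimates already fixed at x = 1
    for k, c in enumerate(order):
        remaining = len(order) - k
        # Common value of P(c) * x_c if the leftover budget is spread uniformly.
        level = (gamma - used) / remaining
        if p_est[c] >= level:
            # Uniform learning rate is feasible for this and all likelier estimates.
            for later in order[k:]:
                x[later] = level / p_est[later]
            return x
        x[c] = 1.0                        # rare estimate: explore whenever it occurs
        used += p_est[c]
    return x
```

For example, with hypothetical probabilities {0.05, 0.45, 0.5} and $\gamma = 0.3$, the rarest estimate is capped at $x = 1$ and the remaining budget of 0.25 is split so that the two likelier estimates share the same product $P(\hat{C}_i = s)\, x_s$.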

We now perform a stability region analysis of the proposed policy. Define the stability region of a
policy as the exhaustive set of arrival rates such that the network queues are rendered stable under the
policy. The stability region of the proposed policy, parameterized by $\gamma \in (0, 1)$, is recorded below.

Proposition 5. The stability region of the proposed policy is given by

$$\Lambda_\gamma = \Big\{\lambda \ \text{s.t.}\ \frac{\lambda}{1 - \gamma} \in \Lambda\Big\} \triangleq (1 - \gamma)\Lambda,$$

where $\Lambda$ is the stability region of the network when complete channel/estimator joint statistics is available
at the scheduler.

Proof Outline: The proof proceeds by showing that, under the proposed joint statistics learning - 
scheduling policy, the instantaneous maximal sum of the queue weighted achievable rates, with sufficient 
time, can be arbitrarily close to the case when perfect knowledge of the statistics is available. Details are 
provided in Appendix E. 

4.3 Throughput - Delay Tradeoff 

As $\gamma \to 0$, the proposed policy has a stability region that can be arbitrarily close to the system stability
region $\Lambda$. The trade-off involved here is the speed of convergence and hence queueing delays before
convergence. Since an analytical study of this trade-off appears complicated, we proceed to perform a
numerical study. The simulation setup is described next.

We use i.i.d. Rayleigh fading channels with a minimum mean square error (MMSE) channel estimator
as seen in [22] and [23]. The channel model is given by

$$Y = \sqrt{\rho}\, h X + \nu,$$

where $X$, $Y$ correspond to transmitted and received signals, $\rho$ is the average SNR at the receiver, and $\nu$ is
the additive noise. Both $h$ and $\nu$ are zero-mean complex Gaussian random variables, i.e., with probability
density $\mathcal{CN}(0, 1)$. Let $\hat{h}$ denote the estimate of the channel and $\tilde{h}$ denote the estimation error. Under
the channel statistics assumed, $\tilde{h}$ is zero-mean complex Gaussian with variance $\beta$, where the value of
$\beta$ depends on the resources allocated for estimation ([24]). Given the value of $h$, the channel rate is
$R = \log(1 + \rho |h|^2)$. We quantize the transmission rate to make the channel state space discrete and
finite. We assume a two-user network and fix $\beta = 0.1$ and $\rho = 50$ for both users' channels. We study the
average behavior of the proposed policy by implementing it over 10000 parallel queueing systems.
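
One way to generate channel/estimate pairs for such a simulation is sketched below. Several details are assumptions of this sketch and are not specified in the text: the estimate is modelled as $\hat{h} \sim \mathcal{CN}(0, 1 - \beta)$, independent of the error $\tilde{h} \sim \mathcal{CN}(0, \beta)$, so that $h = \hat{h} + \tilde{h} \sim \mathcal{CN}(0, 1)$; the logarithm is natural; and rates are quantized by rounding down to an integer. The function names are hypothetical.

```python
import math
import random


def complex_gaussian(variance, rng):
    """Draw a zero-mean circularly symmetric complex Gaussian sample."""
    s = math.sqrt(variance / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def sample_slot(rho=50.0, beta=0.1, rng=random):
    """Return (true_rate, estimated_rate) for one slot, both quantized.

    Assumed model: h = h_hat + h_err with independent h_hat ~ CN(0, 1 - beta)
    and h_err ~ CN(0, beta); rates are floor(log(1 + rho * |.|^2)).
    """
    h_hat = complex_gaussian(1 - beta, rng)
    h_err = complex_gaussian(beta, rng)
    h = h_hat + h_err
    true_rate = math.floor(math.log(1 + rho * abs(h) ** 2))
    est_rate = math.floor(math.log(1 + rho * abs(h_hat) ** 2))
    return true_rate, est_rate
```

Counting how often each (true rate, estimated rate) pair occurs over many slots gives an empirical version of the channel/estimator joint statistics that the learning policy tries to recover.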

We first study the time evolution of the probability of transmission success for different values of $\gamma$.
Fig. 4(a) shows that, for any $\gamma$, the probability of successful transmission increases as the accuracy of the
estimate of the channel/estimator joint statistics improves with time. Also, as expected, the larger the
value of $\gamma$, the faster the improvement in the probability of successful transmission. Note that higher
transmission success probability essentially means fewer retransmissions. This is illustrated in
Fig. 4(b).

In Fig. 5 we study the time evolution of the average packet delay - the delay between the time a
packet enters the queue and the time it leaves the head of the queue - for various values of $\gamma$. Note that
$\gamma$ influences the average delay through (1) the average number of retransmissions and (2) the fraction of
time slots available for transmissions. It is expected that the nature of the influence of $\gamma$ on the average
delay depends on whether the estimate of the channel/estimator joint statistics has reached convergence
or not. After convergence, the average delay is influenced by $\gamma$ solely through the fraction of time slots
available for transmissions. Thus, after convergence, the higher the value of $\gamma$, the higher the average
delay. This is illustrated in Fig. 5. Before convergence, however, the effect of $\gamma$ on the average delay is
not straightforward. Fig. 5, along with the fact that higher $\gamma$ results in faster convergence, suggests the
following: before convergence, $\gamma$ influences the average delay predominantly through the average number
of retransmissions, resulting in decreasing average delay for increasing $\gamma$. In fact, Fig. 5 suggests the
existence of a larger phenomenon: the trade-off between throughput (the stability region) and the delay
before convergence.

5 Conclusion 

We studied scheduling with rate adaptation in single-hop queueing networks, under imperfect channel 
state information. Under complete knowledge of the channel/estimator joint statistics at the scheduler, 
we characterized the network stability region and proposed a maximum-weight type scheduling policy 
that is throughput optimal. Under incomplete knowledge of the channel/estimator joint statistics, we 
designed a joint statistics learning - scheduling policy that maximizes the worst case rate of convergence 
of the statistics learning mechanism. We showed that the proposed policy can be tuned to achieve a 
stability region arbitrarily close to the network stability region with a corresponding trade-off in the 
average packet delay before convergence and the time for convergence. 
Figure 5: Illustration of the average packet delay over time for various values of $\gamma$.

Appendices



A Proof of Proposition 1 

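Proof. (Sufficiency) Define the Lyapunov function $L(Q[t]) = \sum_{i=1}^{N} Q_i^2[t]$. Recalling the queue dynamics
given by Equation (1), the Lyapunov drift can be written as

$$\Delta L[t] = E\Big[\sum_{i=1}^{N} \big(Q_i^2[t+1] - Q_i^2[t]\big) \,\Big|\, Q[t]\Big]
\le B + 2 \sum_{i=1}^{N} Q_i[t] \Big(\lambda_i - E\big[\mathbf{1}(I[t] = i) \cdot R[t] \cdot \mathbf{1}(R[t] \le C_i[t]) \,\big|\, Q[t]\big]\Big), \qquad (7)$$

where

$$B = \sum_{i=1}^{N} E\big[\mathbf{1}(I[t] = i) \cdot R^2[t]\, \mathbf{1}(R[t] \le C_i[t]) + A_i^2[t] \,\big|\, Q[t]\big].$$

Note that $B$ is bounded. Let $r^*(\hat{c}_i) = \arg\max_{r} P(C_i \ge r \mid \hat{C}_i = \hat{c}_i) \cdot r$. Consider any arrival vector
$\lambda$ strictly within the interior of $\Lambda$. For each channel estimate vector $\hat{c}$, there exist a scaling vector $\alpha^{\hat{c}}$ and $\delta > 0$
associated with it such that

$$\lambda_i + \delta \le \sum_{\hat{c} \in S^N} \pi_{\hat{c}} \cdot \alpha_i^{\hat{c}} \cdot P(C_i \ge r^*(\hat{c}_i) \mid \hat{C}_i = \hat{c}_i) \cdot r^*(\hat{c}_i), \qquad (8)$$

for any user $i$, where $\pi_{\hat{c}} = P_{\hat{c}}(\hat{c})$ is the probability of the channel estimate vector $\hat{c}$ and $\sum_{i=1}^{N} \alpha_i^{\hat{c}} = 1$ for each $\hat{c}$.

Therefore, we can design a scheduling policy that does the following: At the channel estimate
$\hat{C}[t] = \hat{c}[t]$, channel $i$ is activated with probability $\alpha_i^{\hat{c}}$. The rate allocated to channel $i$ will be $r^*(\hat{c}_i[t])$.

Then the service rate of user $i$ will be:

$$\mu_i = E\big[\mathbf{1}(I[t] = i) \cdot R_i[t] \cdot \mathbf{1}(R_i[t] \le C_i[t])\big]
= \sum_{\hat{c} \in S^N} \pi_{\hat{c}} \cdot \alpha_i^{\hat{c}} \cdot P(C_i \ge r^*(\hat{c}_i) \mid \hat{C}_i = \hat{c}_i) \cdot r^*(\hat{c}_i). \qquad (9)$$

Substituting (9) into (8) we have

$$\lambda_i - \mu_i \le -\delta. \qquad (10)$$

Note that in this policy the rate adaptation and link activation are completely determined by the
channel estimate and do not rely on queue length information, and therefore in this case

$$E\big[\mathbf{1}(I[t] = i) \cdot R[t] \cdot \mathbf{1}(R[t] \le C_i[t])\big] = E\big[\mathbf{1}(I[t] = i) \cdot R[t] \cdot \mathbf{1}(R[t] \le C_i[t]) \,\big|\, Q[t]\big]. \qquad (11)$$

Substituting (10) and (11) into (7), the Lyapunov drift now becomes

$$\Delta L[t] \le B - 2\delta \sum_{i=1}^{N} Q_i[t].$$

Because the scheduling and rate adaptation decision only depends on the current queue length and
current channel estimate state, the queue evolves as a Markov chain. According to the Foster-Lyapunov
stability criterion [18], the queues will be stable.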



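(Necessity) From the strict separation theorem [17], for any arrival rate vector $\lambda$ outside the proposed region $\Lambda$, there exist a vector $\beta$ and a constant $\delta > 0$ such that, for any vector $\nu$ inside the stability region $\Lambda$,

\[
\sum_{i=1}^{N}\beta_i(\lambda_i - \nu_i) \ge \delta.
\]

Define the Lyapunov function $L(Q[t]) = \sum_{i=1}^{N}\beta_i Q_i[t]$. For any stationary scheduling policy that makes decisions $I[t]$ and $R[t]$, we have the following Lyapunov drift expression:

\[
\begin{aligned}
E\big[L(Q[t+1]) - L(Q[t])\,\big|\,Q[t]\big]
&= \sum_{i=1}^{N}\beta_i\,E\big[A_i[t] - \mathbf{1}(I[t]=i)\cdot\mathbf{1}(R_i[t]\le C_i[t])\cdot R_i[t]\,\big|\,Q[t]\big] \\
&= \sum_{i=1}^{N}\beta_i\Big[\lambda_i - E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(R_i[t]\le C_i[t])\cdot R_i[t]\,\big|\,Q[t]\big]\Big].
\end{aligned}
\tag{12}
\]

Let $\mu_i = E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(R_i[t]\le C_i[t])\cdot R_i[t]\,\big|\,Q[t]\big]$.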
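Next we show that $\mu \in \Lambda$. Consider

\[
\begin{aligned}
&E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(R_i[t]\le C_i[t])\cdot R_i[t]\,\big|\,Q[t]\big] \\
&\quad= E\Big[E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(R_i[t]\le C_i[t])\cdot R_i[t]\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big] \\
&\quad= \sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(R_i[t]\le C_i[t])\cdot R_i[t]\,\big|\,Q[t],\hat{C}[t]=\hat{c}\big] \\
&\quad= \sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\Pr\big(R_i[t]\le C_i[t]\mid\hat{C}_i[t]=\hat{c}_i\big) \\
&\quad\le \sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot\mathbf{1}(I[t]=i)\cdot r_i^*(\hat{c}_i)\cdot\Pr\big(r_i^*(\hat{c}_i)\le C_i[t]\mid\hat{C}_i[t]=\hat{c}_i\big).
\end{aligned}
\]

The third equality holds because $R[t]$ and $I[t]$ are determined by the current channel estimate and queue length information within the class G of stationary policies, and because of the i.i.d. channel assumption. The above expression indicates that $\mu \in \Lambda$. Hence, from (12),

\[
E\big[L(Q[t+1]) - L(Q[t])\,\big|\,Q[t]\big]
= \sum_{i=1}^{N}\beta_i\Big(\lambda_i - E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(R_i[t]\le C_i[t])\cdot R_i[t]\,\big|\,Q[t]\big]\Big) \ge \delta.
\]

The Lyapunov function therefore always has a positive drift, and hence the queues are unstable.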



□ 



B Proof of Proposition 2 

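Proof. Assume the arrival rate vector $\lambda$ lies strictly within the interior of the stability region, so that there exists $\epsilon > 0$ such that $\lambda + \epsilon\mathbf{1} \in \mathrm{int}(\Lambda)$. As in the proof of Proposition 1, there exists a randomized scheduling policy $\mathcal{G}_0$ that stably supports the arrival rate vector $\lambda + \epsilon\mathbf{1}$ and that depends only on the estimated channel state.

Suppose the proposed scheduling policy $\Psi$ results in rate allocation $R_i[t]$ and scheduling decision $I[t]$ at time $t$. Consider the policy $\mathcal{G}_0$ acting at the same time $t$ with the same channel state estimate and queue length knowledge, and denote its rate allocation by $\tilde{R}_i[t]$ and its link activation decision by $\tilde{I}[t]$. We then have

\[
\begin{aligned}
&\sum_{i=1}^{N} Q_i[t]\cdot E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot\mathbf{1}(\tilde{R}_i[t]\le C_i[t])\big] \\
&\quad= \sum_{i=1}^{N} Q_i[t]\cdot E\Big[E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot\mathbf{1}(\tilde{R}_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big] \\
&\quad\le \sum_{i=1}^{N} Q_i[t]\cdot E\Big[E\big[\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\mathbf{1}(R_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big].
\end{aligned}
\]

The equality uses the fact that the decisions of $\mathcal{G}_0$ depend only on the channel estimate. The inequality holds because policy $\Psi$ maximizes the conditional expectation on its right-hand side in every time slot. Also, because the queue of each user is stable under policy $\mathcal{G}_0$, we have

\[
\lambda_i + \epsilon \le E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot\mathbf{1}(\tilde{R}_i[t]\le C_i[t])\big].
\]

Therefore,

\[
\begin{aligned}
\sum_{i=1}^{N} Q_i[t]\cdot(\lambda_i+\epsilon)
&\le \sum_{i=1}^{N} Q_i[t]\cdot E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot\mathbf{1}(\tilde{R}_i[t]\le C_i[t])\big] \\
&\le \sum_{i=1}^{N} Q_i[t]\cdot E\Big[E\big[\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\mathbf{1}(R_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big].
\end{aligned}
\]

Substituting this back into the Lyapunov drift expression (7), we obtain

\[
\begin{aligned}
\Delta L[t] &\le B + 2\sum_{i=1}^{N} Q_i[t]\Big(\lambda_i - E\Big[E\big[\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\mathbf{1}(R_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big]\Big) \\
&\le B - 2\epsilon\sum_{i=1}^{N} Q_i[t].
\end{aligned}
\]

Because scheduling policy $\Psi$ depends only on the current queue lengths and channel estimate, and because the channel process is i.i.d. across time, the queue evolution under policy $\Psi$ is a Markov chain. By the Foster-Lyapunov criterion, the statement is proven. □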
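The per-slot maximization that the proof relies on can be sketched in Python as follows. The sketch assumes that policy $\Psi$ takes the max-weight form that appears in the proof of Lemma 10, namely $I[t] = \arg\max_i Q_i[t]\,P(C_i \ge r^* \mid \hat{C}_i = \hat{c}_i)\,r^*$ with $r^*$ the rate maximizing the expected served rate. The function names and the dictionary input format are hypothetical.

```python
def success_prob(cond_pmf, r):
    """P(C >= r | C_hat = c_hat) for a conditional pmf given as {c: prob}."""
    return sum(p for c, p in cond_pmf.items() if c >= r)


def max_weight_decision(queues, cond_pmfs):
    """Return (user index, rate) maximizing Q_i * P(C_i >= r | c_hat_i) * r.

    queues[i] is the queue length of user i; cond_pmfs[i] is the conditional
    pmf of user i's true channel given its current estimate (assumed inputs).
    """
    best_user, best_rate, best_weight = None, None, -1.0
    for i, (q, pmf) in enumerate(zip(queues, cond_pmfs)):
        # Rate adaptation: maximize the expected served rate for user i.
        r_star = max(pmf, key=lambda r: success_prob(pmf, r) * r)
        weight = q * success_prob(pmf, r_star) * r_star
        if weight > best_weight:
            best_user, best_rate, best_weight = i, r_star, weight
    return best_user, best_rate


pmfs = [{1: 0.9, 2: 0.1}, {1: 0.2, 2: 0.8}]
decision_long_queue = max_weight_decision([10, 4], pmfs)
decision_short_queue = max_weight_decision([3, 4], pmfs)
```

With queue lengths 10 and 4, the first user wins (weight 10 against 6.4) and transmits at rate 1; with queue lengths 3 and 4, the second user wins and transmits at rate 2.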



C Proof of Stability Region Without Rate Adaptation 

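Proof. The proof is similar to the proof of Proposition 1.

(Sufficiency) Define the Lyapunov function $L(Q[t]) = \sum_{i=1}^{N} Q_i^2[t]$; the Lyapunov drift can be written as in Equation (7). For any arrival rate vector $\lambda$ strictly within the interior of $\Lambda$ and each estimated channel state $\hat{c}$, there exist a vector $\alpha^{\hat{c}}$ and a $\delta > 0$ satisfying

\[
\lambda_i + \delta < \sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot\alpha_i^{\hat{c}}\cdot P\big(C_i \ge \hat{c}_i\mid\hat{C}_i=\hat{c}_i\big)\cdot\hat{c}_i,
\]

for every user $i$, where $\sum_{i=1}^{N}\alpha_i^{\hat{c}} = 1$ for all $\hat{c}$.

The rest of the proof follows as in Proposition 1.

(Necessity) Similar to the proof of Proposition 1, for any arrival rate vector $\lambda$ outside $\Lambda$, there exist a vector $\beta$ and a constant $\delta > 0$ such that, for any vector $\nu$ inside the stability region $\Lambda$, we have $\sum_{i=1}^{N}\beta_i(\lambda_i-\nu_i) \ge \delta$.

Define the Lyapunov function $L(Q[t]) = \sum_{i=1}^{N}\beta_i Q_i[t]$. For any stationary policy that makes scheduling decision $I[t]$ and rate adaptation decision $R[t]$, we again obtain a Lyapunov drift expression similar to Equation (12):

\[
E\big[L(Q[t+1]) - L(Q[t])\,\big|\,Q[t]\big]
= \sum_{i=1}^{N}\beta_i\Big[\lambda_i - E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(\hat{C}_i[t]\le C_i[t])\cdot\hat{C}_i[t]\,\big|\,Q[t]\big]\Big].
\]

Let $\mu_i = E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(\hat{C}_i[t]\le C_i[t])\cdot\hat{C}_i[t]\,\big|\,Q[t]\big]$. We can show $\mu \in \Lambda$ from

\[
\begin{aligned}
&E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(\hat{C}_i[t]\le C_i[t])\cdot\hat{C}_i[t]\,\big|\,Q[t]\big] \\
&\quad= E\Big[E\big[\mathbf{1}(I[t]=i)\cdot\mathbf{1}(\hat{C}_i[t]\le C_i[t])\cdot\hat{C}_i[t]\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big] \\
&\quad= \sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot\mathbf{1}(I[t]=i)\cdot\hat{c}_i\cdot\Pr\big(\hat{c}_i\le C_i[t]\mid\hat{C}[t]=\hat{c}\big) \\
&\quad= \sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot\mathbf{1}(I[t]=i)\cdot\hat{c}_i\cdot\Pr\big(\hat{c}_i\le C_i[t]\mid\hat{C}_i[t]=\hat{c}_i\big).
\end{aligned}
\]

The rest follows as in the proof of Proposition 1.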



□ 
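D Proof of Proposition 4

For notational convenience, we drop the user index $i$ in the proof. The optimization problem (U) can be rewritten as

\[
\begin{aligned}
\max_{x}\ \min_{\hat{c}\in\mathcal{S}}\quad & \Pr(\hat{C}=\hat{c})\,x_{\hat{c}} \\
\text{s.t.}\quad & \sum_{\hat{c}\in\mathcal{S}}\Pr(\hat{C}=\hat{c})\,x_{\hat{c}} = \gamma, \\
& \sum_{\hat{c}\in\mathcal{S}}\Pr(\hat{C}=\hat{c}) = 1, \\
& 0 \le x_{\hat{c}} \le 1.
\end{aligned}
\]

This problem can be transformed into the following linear program:

\[
\begin{aligned}
\max\quad & t \\
\text{s.t.}\quad & \Pr(\hat{C}=\hat{c})\,x_{\hat{c}} \ge t, \\
& \sum_{\hat{c}\in\mathcal{S}}\Pr(\hat{C}=\hat{c})\,x_{\hat{c}} = \gamma, \\
& \sum_{\hat{c}\in\mathcal{S}}\Pr(\hat{C}=\hat{c}) = 1, \\
& 0 \le x_{\hat{c}} \le 1.
\end{aligned}
\]

Hence the problem has become a convex optimization problem. Let $[t^*, x^*_{s_1}, x^*_{s_2}, \cdots, x^*_{s_{|\mathcal{S}|}}]$ be the optimal solution to the above problem and let $\Omega = \{s_k : x^*_{s_k} = 1\}$.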

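Lemma 6. The optimal solution to the optimization problem (U) must satisfy the following structural properties:

(i) If $\Omega = \emptyset$, then $\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma/|\mathcal{S}|$ for all $k$.

(ii) Conversely, if $\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma/|\mathcal{S}|$ for all $k$, then $\Omega = \emptyset$, except when $\Pr(\hat{C}=s_1) = \gamma/|\mathcal{S}|$.

Proof. (i). Suppose $\Omega = \emptyset$ but $\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma/|\mathcal{S}|$ does not hold for all $k$. Let $n$ be such that $t^* = \Pr(\hat{C}=s_n)\,x^*_{s_n}$. Because the values $\Pr(\hat{C}=s_k)\,x^*_{s_k}$ are not equal for every $k$, there exists $m \ne n$ such that

\[
t^* = \Pr(\hat{C}=s_n)\,x^*_{s_n} < \Pr(\hat{C}=s_m)\,x^*_{s_m}.
\]

Because $\Omega = \emptyset$, $x^*_{s_n} \ne 1$ and $x^*_{s_m} \ne 1$. Let $\Pi = \{s_k : t^* = \Pr(\hat{C}=s_k)\,x^*_{s_k}\}$ and let $|\Pi| = \pi$. Suppose we set

\[
x^+_{s_l} = x^*_{s_l} + \frac{\delta}{\pi\,\Pr(\hat{C}=s_l)},\quad \forall\, s_l\in\Pi,
\qquad
x^-_{s_m} = x^*_{s_m} - \frac{\delta}{\Pr(\hat{C}=s_m)},
\]

where $\delta > 0$ is small enough to guarantee that $x^-_{s_m}$ stays positive.

We can check that in this case the constraint still holds:

\[
\sum_{s_k\notin\Pi,\ k\ne m}\Pr(\hat{C}=s_k)\,x^*_{s_k} + \sum_{s_l\in\Pi}\Pr(\hat{C}=s_l)\,x^+_{s_l} + \Pr(\hat{C}=s_m)\,x^-_{s_m} = \gamma.
\]

But the new value of the objective function then satisfies $t^*_{\mathrm{new}} > t^*$, which contradicts the assumption that $t^*$ is the optimal value. Therefore we must have $\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma/|\mathcal{S}|$ for all $k$, establishing (i).

(ii). When $\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma/|\mathcal{S}|$ for all $k$ and $x^*_{s_k} < 1$ for all $k$, we have $\Omega = \emptyset$. If there exists $h$ such that $\Pr(\hat{C}=s_h) = \gamma/|\mathcal{S}|$, then $x^*_{s_h} = 1$. By assumption,

\[
\Pr(\hat{C}=s_1)\,x^*_{s_1} = \Pr(\hat{C}=s_h)\,x^*_{s_h} = \Pr(\hat{C}=s_h).
\]

Because $\Pr(\hat{C}=s_1) \le \Pr(\hat{C}=s_h)$ and $x^*_{s_1} \le 1$, we must have $\Pr(\hat{C}=s_1) = \Pr(\hat{C}=s_h)$ and $x^*_{s_1} = 1$, establishing (ii). □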
Lemma 7. If $\Omega \ne \emptyset$, then $x^*_{s_1} = 1$ and $t^* = \Pr(\hat{C}=s_1)$.

Proof. We prove this lemma by contradiction. Suppose that $x^*_{s_1} < 1$.

(Case 1). If $t^* = \Pr(\hat{C}=s_1)\,x^*_{s_1}$, suppose without loss of generality that $s_1$ is the only channel state with $\Pr(\hat{C}=s_1)\,x^*_{s_1} = t^*$. Because $\Omega \ne \emptyset$, suppose $x^*_{s_j} = 1$ for some state $s_j$. If we set $x^{\mathrm{new}}_{s_j} = 1 - \delta/\Pr(\hat{C}=s_j)$ and $x^{\mathrm{new}}_{s_1} = x^*_{s_1} + \delta/\Pr(\hat{C}=s_1)$, where $\delta$ is small enough, we obtain a new value of the objective function that is strictly larger than $t^*$ while still satisfying the constraints in (U), which contradicts the optimality of $[t^*, x^*_{s_1}, x^*_{s_2}, \cdots, x^*_{s_{|\mathcal{S}|}}]$.

(Case 2). If $t^* < \Pr(\hat{C}=s_1)\,x^*_{s_1}$, suppose $t^* = \Pr(\hat{C}=s_m)\,x^*_{s_m}$ for some $s_m$ and assume, with no loss of generality, that it is the only state of this kind. Because $\Pr(\hat{C}=s_m)\,x^*_{s_m} < \Pr(\hat{C}=s_1)\,x^*_{s_1}$ and $\Pr(\hat{C}=s_m) \ge \Pr(\hat{C}=s_1)$, we have $x^*_{s_m} < x^*_{s_1} < 1$. We can set $x^{\mathrm{new}}_{s_m} = x^*_{s_m} + \delta/\Pr(\hat{C}=s_m)$ and $x^{\mathrm{new}}_{s_1} = x^*_{s_1} - \delta/\Pr(\hat{C}=s_1)$ for $\delta$ small. Again, this change of variables results in a new objective function value strictly larger than $t^*$, contradicting the optimality of $t^*$.

Therefore, we have $x^*_{s_1} = 1$.

Now suppose $t^* < \Pr(\hat{C}=s_1)\,x^*_{s_1}$. Similar to Case 2, suppose $t^* = \Pr(\hat{C}=s_m)\,x^*_{s_m}$ for some $s_m$, and assume such $s_m$ is unique; then $x^*_{s_m} < 1$ because $\Pr(\hat{C}=s_m) \ge \Pr(\hat{C}=s_1)$. By letting $x^{\mathrm{new}}_{s_m} = x^*_{s_m} + \delta/\Pr(\hat{C}=s_m)$ and $x^{\mathrm{new}}_{s_1} = 1 - \delta/\Pr(\hat{C}=s_1)$ with $\delta$ small, we obtain a strictly larger objective function value, contradicting the optimality of $t^*$. Therefore we must have $t^* = \Pr(\hat{C}=s_1)\,x^*_{s_1}$. □

After we have established the above lemmas, we proceed to the proof of Proposition 4. 

(Proof of Proposition 4) 

Proof. 

(Case 1). First consider the case when $\Pr(\hat{C}=s_1) > \gamma/|\mathcal{S}|$. If $\Omega \ne \emptyset$ in this case, then from Lemma 7, $t^* = \Pr(\hat{C}=s_1) > \gamma/|\mathcal{S}|$. Then $\sum_{k=1}^{|\mathcal{S}|}\Pr(\hat{C}=s_k)\,x^*_{s_k} \ge |\mathcal{S}|\cdot t^* > |\mathcal{S}|\cdot\gamma/|\mathcal{S}| = \gamma$, which contradicts the constraint $\sum_{k=1}^{|\mathcal{S}|}\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma$. So $\Omega = \emptyset$, and from Lemma 6 we have $\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma/|\mathcal{S}|$ for all $k$ and

\[
x^*_{s_k} = \frac{\gamma}{|\mathcal{S}|\cdot\Pr(\hat{C}=s_k)}.
\]

(Case 2). Consider the case when $\gamma/|\mathcal{S}| = \Pr(\hat{C}=s_1)$. If $\Omega = \emptyset$, then from Lemma 6 (i), $x^*_{s_1} = 1$, which contradicts $\Omega = \emptyset$. So $\Omega \ne \emptyset$ and, from Lemma 7, $t^* = \Pr(\hat{C}=s_1) = \gamma/|\mathcal{S}|$. Because

\[
\sum_{i=1}^{|\mathcal{S}|}\Pr(\hat{C}=s_i)\,x^*_{s_i} \ge \sum_{i=1}^{|\mathcal{S}|} t^* = \gamma,
\]

we must have $t^* = \Pr(\hat{C}=s_k)\,x^*_{s_k}$ for all $k$ in order to satisfy the constraint $\sum_{k=1}^{|\mathcal{S}|}\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma$. Therefore we still have

\[
x^*_{s_k} = \frac{\gamma}{|\mathcal{S}|\cdot\Pr(\hat{C}=s_k)}.
\]

It is easy to check that $0 \le x^*_{s_k} \le 1$ in Case 1 and Case 2. We have hence justified step (2) in Proposition 4 when $k = 1$.

(Case 3). If $\Pr(\hat{C}=s_1) < \gamma/|\mathcal{S}|$, then we cannot set $\Pr(\hat{C}=s_k)\,x^*_{s_k} = \gamma/|\mathcal{S}|$ for all $k$ because $x^*_{s_1} \le 1$. So from Lemma 6, $\Omega \ne \emptyset$. From Lemma 7, $x^*_{s_1} = 1$ and $t^* = \Pr(\hat{C}=s_1)$.

Having identified the optimal value $t^*$ of the objective function and $x^*_{s_1}$, we still need to identify the rest of the solution, $x^*_{s_j}$ for $j \ne 1$. Admitting that there might be multiple solutions for those $x^*_{s_j}$, we consider the following relaxed optimization problem (U$^+$):

\[
\begin{aligned}
\max\quad & t \\
\text{s.t.}\quad & \Pr(\hat{C}=s_k)\,x_{s_k} \ge t,\quad k \ne 1, \\
& \sum_{k\ne 1}\Pr(\hat{C}=s_k)\,x_{s_k} = \gamma^+, \\
& x_{s_1} = 1;\quad 0 \le x_{s_k} \le 1,\ k \ne 1,
\end{aligned}
\]

where $\gamma^+ = \gamma - \Pr(\hat{C}=s_1)$.

It can be readily verified that Lemma 6 and Lemma 7 hold for the above optimization problem with $\gamma$ and $\mathcal{S}$ substituted by $\gamma^+$ and $\mathcal{S}\setminus s_1$, respectively. Let $\tilde{x}^*_{s_j}$, $j = 1, \cdots, N$, be the optimal solution of the optimization problem (U$^+$). We proceed to show that $\tilde{x}^*_{s_j}$ is also an optimal solution to the optimization problem (U), i.e., that it satisfies all the constraints of (U) and preserves the optimal value $t^*$ of (U) identified earlier.

Let $\tilde{t}^*$ be the optimal objective value of problem (U$^+$). To show that the optimal solution to (U$^+$) (i.e., $\tilde{x}^*_{s_j}$ for $j \ne 1$) preserves the optimality of $t^*$, we must check that $\tilde{t}^* \ge t^*$. This is indeed the case, as explained next. Let $\tilde{\Omega}$ be the set of estimated channel states $s_h$ such that $\tilde{x}^*_{s_h} = 1$ in (U$^+$). When $\tilde{\Omega} \ne \emptyset$, from Lemma 7, $\tilde{t}^* = \Pr(\hat{C}=s_2)\,\tilde{x}^*_{s_2} = \Pr(\hat{C}=s_2)\cdot 1 \ge \Pr(\hat{C}=s_1)\cdot 1 \ge \Pr(\hat{C}=s_1)\cdot x^*_{s_1} \ge t^*$. When $\tilde{\Omega} = \emptyset$, from Lemma 6 we have, for all $j \ne 1$,

\[
\tilde{t}^* = \Pr(\hat{C}=s_j)\,\tilde{x}^*_{s_j} = \frac{\gamma^+}{|\mathcal{S}|-1} = \frac{\gamma-\Pr(\hat{C}=s_1)}{|\mathcal{S}|-1} > \frac{|\mathcal{S}|\Pr(\hat{C}=s_1)-\Pr(\hat{C}=s_1)}{|\mathcal{S}|-1} = \Pr(\hat{C}=s_1) = t^*,
\]

where the inequality follows from $\gamma/|\mathcal{S}| > \Pr(\hat{C}=s_1)$, assumed at the beginning of Case 3. So the optimality of $t^*$ is preserved. It is also clear that the constraint $\Pr(\hat{C}=s_1)\,x^*_{s_1} + \sum_{j=2}^{|\mathcal{S}|}\Pr(\hat{C}=s_j)\,\tilde{x}^*_{s_j} = \gamma$ is satisfied.

We hence face a reduced optimization problem (U$^+$), whose optimal solution is also optimal for the original optimization problem (U). Problem (U$^+$) takes the same form as (U) with $\gamma$ and $\mathcal{S}$ substituted by $\gamma^+$ and $\mathcal{S}\setminus s_1$, respectively. The proposed algorithm solves the reduced optimization problem by conditioning on the reduced settings of (Case 1)-(Case 3). Hence a proof similar to that of (Case 1)-(Case 3) also applies to the iterative algorithm. By carrying out the proof iteratively, the optimality of the algorithm is proved. □
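The iterative structure implied by Cases 1-3 can be sketched in Python as follows. This is a reading of the case analysis above rather than a transcription of the algorithm stated in Proposition 4: the function name `observation_probabilities` is hypothetical, and the sketch assumes strictly positive state probabilities that sum to one.

```python
def observation_probabilities(probs, gamma):
    """Max-min allocation x with sum_k probs[k] * x[k] == gamma, 0 <= x <= 1.

    probs[k] stands for Pr(C_hat = s_k); states need not be pre-sorted.
    """
    if any(p <= 0 for p in probs):
        raise ValueError("state probabilities are assumed strictly positive")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    order = sorted(range(len(probs)), key=lambda k: probs[k])  # least likely first
    x = [0.0] * len(probs)
    remaining = gamma  # plays the role of gamma^+ after each reduction
    for pos, k in enumerate(order):
        share = remaining / (len(order) - pos)  # gamma / |S| on the reduced problem
        if probs[k] < share:
            # Case 3: the least likely state is saturated, then the problem is reduced.
            x[k] = 1.0
            remaining -= probs[k]
        else:
            # Cases 1 and 2: equalize probs[j] * x[j] over all remaining states.
            for j in order[pos:]:
                x[j] = share / probs[j]
            break
    return x


x = observation_probabilities([0.1, 0.3, 0.6], gamma=0.6)
```

In this example the rarest state is always observed ($x = 1$), and the remaining budget $0.5$ is split so that the other two states are each observed at rate $0.25$ per slot.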



E Proof of Proposition 5 

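We need the following lemmas before proving Proposition 5:

Lemma 8. For any user $i$ and any observed channel state $\hat{c}_k$, there exists a constant $T$ such that for any time $t > T$ and any channel states $r_1$ and $r_2$, if $P(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k)\,r_1 > P(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k)\,r_2$, then

\[
P_t(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k)\,r_1 > P_t(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k)\,r_2 \quad\text{almost surely.}
\]

Proof. Let $\Delta = P(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k)\,r_1 - P(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k)\,r_2$. Then there exist $\delta > 0$ and $\epsilon > 0$ such that $\delta r_1 + \epsilon r_2 < \Delta$. From the strong law of large numbers, there exists a time $T$ such that for $t > T$,

\[
\begin{aligned}
P_t(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k) - P(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k) &> -\delta \quad\text{a.s.,} \\
P_t(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k) - P(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k) &< \epsilon \quad\text{a.s.}
\end{aligned}
\]

Hence we have

\[
\begin{aligned}
&P_t(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k)\,r_1 - P_t(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k)\,r_2 \\
&\quad> P(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k)\,r_1 - \delta r_1 - P(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k)\,r_2 - \epsilon r_2 \\
&\quad= P(C_i \ge r_1\mid\hat{C}_i=\hat{c}_k)\,r_1 - P(C_i \ge r_2\mid\hat{C}_i=\hat{c}_k)\,r_2 - (\delta r_1 + \epsilon r_2) \\
&\quad> \Delta - \Delta = 0 \quad\text{a.s.}
\end{aligned}
\]

□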






Remark: This lemma implies that there is a time beyond which the rate allocated using the empirical knowledge $P_t(C_i \ge r\mid\hat{C}_i=\hat{c}_k)$ is the same as the rate allocated with accurate knowledge. Because both the number of users and the state space are finite, the right rate is chosen with probability one once $t$ is large enough, which is summarized in the following corollary.

Corollary 9. There exists a time $T$ beyond which, with probability 1, the empirical scheme allocates a rate $\tilde{R}[t]$ equal to the rate $R[t]$ allocated when $P(C[t]\mid\hat{C}[t])$ is perfectly known.



Proof. This result is immediate from the previous lemma.



□ 
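The behaviour described by Lemma 8 and Corollary 9 can be illustrated with a small Python sketch of rate selection from empirical frequencies. The function name `empirical_best_rate` and the sample-list input are hypothetical; the samples stand for true channel states observed in slots where the estimate took one fixed value.

```python
from collections import Counter


def empirical_best_rate(samples, rates):
    """Pick argmax_r P_t(C >= r | C_hat = c_hat) * r from observed samples.

    samples: true channel states recorded for one fixed estimate c_hat
             (assumed to come from the observation slots).
    rates:   candidate transmission rates.
    """
    if not samples:
        raise ValueError("at least one observation is required")
    counts = Counter(samples)
    n = len(samples)

    def score(r):
        # Empirical success probability of rate r, times the rate itself.
        return sum(k for c, k in counts.items() if c >= r) / n * r

    return max(rates, key=score)


few = empirical_best_rate([3, 3, 1], rates=[1, 2, 3])
many = empirical_best_rate([1] * 20 + [2] * 50 + [3] * 30, rates=[1, 2, 3])
```

With only three samples the empirical rule picks rate 3, whereas with a sample that matches the distribution $(0.2, 0.5, 0.3)$ it picks rate 2, the rate that accurate knowledge of that distribution would give. This is the kind of early mismatch that disappears after the time $T$ in Corollary 9.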



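Lemma 10. In a non-observation slot $t$, let $\tilde{I}[t]$ and $\tilde{R}_i[t]$ be the scheduling and rate decisions of the joint statistics learning-scheduling policy, and let $I[t]$ and $R_i[t]$ be the decisions of $\Psi$ with accurate knowledge. For any $\rho > 0$, there exists a time $N$ beyond which, with probability 1,

\[
\Big|\sum_{j=1}^{N} Q_j[t]\,\mathbf{1}(I[t]=j)\,P\big(C_j[t]\ge R_j[t]\mid\hat{C}_j[t]=\hat{c}_j\big)R_j[t]
- \sum_{i=1}^{N} Q_i[t]\,\mathbf{1}(\tilde{I}[t]=i)\,P\big(C_i[t]\ge\tilde{R}_i[t]\mid\hat{C}_i[t]=\hat{c}_i\big)\tilde{R}_i[t]\Big|
\le \rho\sum_{i=1}^{N} Q_i[t].
\]

Proof. From Corollary 9, let $T_1$ be the time beyond which $\tilde{R}[t] = R[t]$ a.s. We consider $t > T_1$, so that $\tilde{R}_i[t] = R_i[t] = r^*_{\hat{c}_i} = \arg\max_r P(C_i[t]\ge r\mid\hat{C}_i[t]=\hat{c}_i)\cdot r$ almost surely for all $i$. From the strong law of large numbers, let $T_2 > T_1$ be such that, beyond $T_2$, $\big|P_t(C_j\ge r^*_{\hat{c}_j}\mid\hat{C}_j=\hat{c}_j)\,r^*_{\hat{c}_j} - P(C_j\ge r^*_{\hat{c}_j}\mid\hat{C}_j=\hat{c}_j)\,r^*_{\hat{c}_j}\big| < \rho$ almost surely for all $j$. We henceforth consider $t > T_2$. Given the queue lengths $Q[t]$ and the estimated channel state information $\hat{C}[t]$, $I[t]$ and $\tilde{I}[t]$ are determined by

\[
\begin{aligned}
I[t] &= \arg\max_i\ Q_i[t]\,P\big(C_i\ge r^*_{\hat{c}_i}\mid\hat{C}_i=\hat{c}_i\big)\,r^*_{\hat{c}_i} \quad\text{a.s.,} \\
\tilde{I}[t] &= \arg\max_i\ Q_i[t]\,P_t\big(C_i\ge r^*_{\hat{c}_i}\mid\hat{C}_i=\hat{c}_i\big)\,r^*_{\hat{c}_i} \quad\text{a.s.}
\end{aligned}
\]

If $\tilde{I}[t] = I[t]$, the statement holds. If $I[t] = h$ but $\tilde{I}[t] = k$ for $h \ne k$, we have

\[
Q_k[t]\,P_t\big(C_k\ge r^*_{\hat{c}_k}\mid\hat{C}_k=\hat{c}_k\big)\,r^*_{\hat{c}_k} \ge Q_h[t]\,P_t\big(C_h\ge r^*_{\hat{c}_h}\mid\hat{C}_h=\hat{c}_h\big)\,r^*_{\hat{c}_h},
\tag{13}
\]

\[
Q_h[t]\,P\big(C_h\ge r^*_{\hat{c}_h}\mid\hat{C}_h=\hat{c}_h\big)\,r^*_{\hat{c}_h} \ge Q_k[t]\,P\big(C_k\ge r^*_{\hat{c}_k}\mid\hat{C}_k=\hat{c}_k\big)\,r^*_{\hat{c}_k}.
\tag{14}
\]

Because $t > T_2$, we further have

\[
\begin{aligned}
&\sum_{j=1}^{N} Q_j[t]\,\mathbf{1}(I[t]=j)\,P\big(C_j\ge R_j[t]\mid\hat{C}_j=\hat{c}_j\big)R_j[t]
- \sum_{i=1}^{N} Q_i[t]\,\mathbf{1}(\tilde{I}[t]=i)\,P\big(C_i\ge\tilde{R}_i[t]\mid\hat{C}_i=\hat{c}_i\big)\tilde{R}_i[t] \\
&\quad= Q_h[t]\,P\big(C_h\ge r^*_{\hat{c}_h}\mid\hat{C}_h=\hat{c}_h\big)\,r^*_{\hat{c}_h} - Q_k[t]\,P\big(C_k\ge r^*_{\hat{c}_k}\mid\hat{C}_k=\hat{c}_k\big)\,r^*_{\hat{c}_k} \\
&\quad\le Q_h[t]\big[P_t\big(C_h\ge r^*_{\hat{c}_h}\mid\hat{C}_h=\hat{c}_h\big)\,r^*_{\hat{c}_h} + \rho\big] - Q_k[t]\big[P_t\big(C_k\ge r^*_{\hat{c}_k}\mid\hat{C}_k=\hat{c}_k\big)\,r^*_{\hat{c}_k} - \rho\big] \\
&\quad\le \rho\,(Q_h[t] + Q_k[t])
\end{aligned}
\]

almost surely, where the first inequality follows from the assumption that $\big|P_t(C_j\ge r^*_{\hat{c}_j}\mid\hat{C}_j=\hat{c}_j)\,r^*_{\hat{c}_j} - P(C_j\ge r^*_{\hat{c}_j}\mid\hat{C}_j=\hat{c}_j)\,r^*_{\hat{c}_j}\big| < \rho$ almost surely for all $j$, and the last inequality follows from (13). Also, we have

\[
\begin{aligned}
&\sum_{i=1}^{N} Q_i[t]\,\mathbf{1}(\tilde{I}[t]=i)\,P\big(C_i\ge\tilde{R}_i[t]\mid\hat{C}_i=\hat{c}_i\big)\tilde{R}_i[t]
- \sum_{j=1}^{N} Q_j[t]\,\mathbf{1}(I[t]=j)\,P\big(C_j\ge R_j[t]\mid\hat{C}_j=\hat{c}_j\big)R_j[t] \\
&\quad= Q_k[t]\,P\big(C_k\ge r^*_{\hat{c}_k}\mid\hat{C}_k=\hat{c}_k\big)\,r^*_{\hat{c}_k} - Q_h[t]\,P\big(C_h\ge r^*_{\hat{c}_h}\mid\hat{C}_h=\hat{c}_h\big)\,r^*_{\hat{c}_h} \\
&\quad\le 0 \\
&\quad\le \rho\,(Q_h[t] + Q_k[t])
\end{aligned}
\]

almost surely, where the first inequality follows from (14). Since $Q_h[t] + Q_k[t] \le \sum_{i=1}^{N} Q_i[t]$, the lemma is proved.

□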
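(Proof of Proposition 5)

Proof. We first prove that for any $\lambda$ strictly within $\Lambda_\gamma$, i.e., $\lambda + \epsilon\mathbf{1} \in \mathrm{int}(\Lambda_\gamma)$, there exists a policy $\mathcal{G}_0$ that makes decisions based only on the empirical statistics $P_t(C_i \ge r\mid\hat{C}_i)$ and stably supports $\lambda$. Because $\lambda + \epsilon\mathbf{1} \in \mathrm{int}(\Lambda_\gamma)$, there exist scaling vectors $\alpha^{\hat{c}}$ such that, for every user $i$,

\[
\lambda_i + \epsilon < (1-\gamma)\sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot\alpha_i^{\hat{c}}\cdot P\big(C_i\ge r_i^*(\hat{c}_i)\mid\hat{C}_i=\hat{c}_i\big)\cdot r_i^*(\hat{c}_i).
\tag{15}
\]

The randomized policy $\mathcal{G}_0$ can then be defined as follows: in every non-observation slot with estimated state $\hat{c}$, activate channel $I[t] = i$ with probability $\alpha_i^{\hat{c}}$ and allocate the transmission rate $R_i[t] = \arg\max_r P_t(C_i[t]\ge r\mid\hat{C}_i[t])\cdot r$. Define the Lyapunov function $L(Q[t]) = \sum_{i=1}^{N} Q_i^2[t]$. Similar to (7), the Lyapunov drift can be written as

\[
\begin{aligned}
\Delta L[t] &= E\Big[\sum_{i=1}^{N}\big(Q_i^2[t+1]-Q_i^2[t]\big)\,\Big|\,Q[t]\Big] \\
&\le B_1 + 2\sum_{i=1}^{N} Q_i[t]\Big(\lambda_i - (1-\gamma)\,E\big[\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\mathbf{1}(R_i[t]\le C_i[t])\,\big|\,Q[t]\big]\Big),
\end{aligned}
\tag{16}
\]

where $B_1$ is bounded,

\[
B_1 = (1-\gamma)^2\sum_{i=1}^{N} E\big[\mathbf{1}(I[t]=i)\cdot R_i^2[t]\,\mathbf{1}(R_i[t]\le C_i[t]) + A_i^2[t]\,\big|\,Q[t]\big].
\]

From Corollary 9, for $t > N$, $R_i[t] = r_i^*(\hat{c}_i) = \arg\max_r P(C_i\ge r\mid\hat{C}_i=\hat{c}_i)\cdot r$ almost surely. Hence

\[
(1-\gamma)\,E\big[\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\mathbf{1}(R_i[t]\le C_i[t])\,\big|\,Q[t]\big]
= (1-\gamma)\sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\cdot\alpha_i^{\hat{c}}\cdot P\big(C_i\ge r_i^*(\hat{c}_i)\mid\hat{C}_i=\hat{c}_i\big)\cdot r_i^*(\hat{c}_i).
\tag{17}
\]

Substituting (15) and (17) into (16), the Lyapunov drift takes the form

\[
\Delta L[t] \le B_1 - 2\epsilon\sum_{i=1}^{N} Q_i[t].
\]

Hence the queues are stable.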

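We next show that the joint statistics learning-scheduling policy stabilizes $\lambda$, similar to the proof of Proposition 2. Given the queue length information $Q[t]$ and the estimated channel state $\hat{C}[t]$, suppose the proposed joint statistics learning-scheduling policy results in rate adaptation $\tilde{R}[t]$ and scheduling decision $\tilde{I}[t]$ at time $t$, and suppose the policy $\Psi$ with accurate knowledge of the joint statistics makes rate adaptation decision $R[t]$ and scheduling decision $I[t]$. We also denote by $\bar{R}[t]$ and $\bar{I}[t]$ the decisions of the randomized policy $\mathcal{G}_0$ constructed above. For the joint statistics learning and scheduling algorithm, the Lyapunov drift can be written as

\[
\begin{aligned}
\Delta L[t] &= E\Big[\sum_{i=1}^{N}\big(Q_i^2[t+1]-Q_i^2[t]\big)\,\Big|\,Q[t]\Big] \\
&\le B_2 + 2\sum_{i=1}^{N} Q_i[t]\Big(\lambda_i - (1-\gamma)\,E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot\mathbf{1}(\tilde{R}_i[t]\le C_i[t])\,\big|\,Q[t]\big]\Big) \\
&= B_2 + 2\sum_{i=1}^{N} Q_i[t]\Big(\lambda_i - (1-\gamma)\,E\Big[E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot\mathbf{1}(\tilde{R}_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big]\Big),
\end{aligned}
\tag{18}
\]

where $B_2$ is bounded,

\[
B_2 = (1-\gamma)^2\sum_{i=1}^{N} E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i^2[t]\,\mathbf{1}(\tilde{R}_i[t]\le C_i[t]) + A_i^2[t]\,\big|\,Q[t]\big].
\]

As in the proof of Proposition 2, we have

\[
\sum_{i=1}^{N} Q_i[t]\cdot E\big[\mathbf{1}(\bar{I}[t]=i)\cdot\bar{R}_i[t]\cdot\mathbf{1}(\bar{R}_i[t]\le C_i[t])\big]
\le \sum_{i=1}^{N} Q_i[t]\cdot E\Big[E\big[\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\mathbf{1}(R_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big].
\]

Therefore, for any $0 < \rho < \epsilon/(1-\gamma)$ and for $t > \max\{T_2, N\}$, we have

\[
\begin{aligned}
\sum_{i=1}^{N} Q_i[t]\cdot(\lambda_i+\epsilon)
&\le (1-\gamma)\sum_{i=1}^{N} Q_i[t]\cdot E\big[\mathbf{1}(\bar{I}[t]=i)\cdot\bar{R}_i[t]\cdot\mathbf{1}(\bar{R}_i[t]\le C_i[t])\big] \\
&\le (1-\gamma)\sum_{i=1}^{N} Q_i[t]\cdot E\Big[E\big[\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot\mathbf{1}(R_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big] \\
&= (1-\gamma)\sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\sum_{i=1}^{N} Q_i[t]\,\mathbf{1}(I[t]=i)\cdot R_i[t]\cdot P\big(R_i[t]\le C_i[t]\mid\hat{C}_i[t]=\hat{c}_i\big) \\
&\le (1-\gamma)\sum_{\hat{c}\in\mathcal{S}^N}\pi_{\hat{c}}\Big[\sum_{i=1}^{N} Q_i[t]\,\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot P\big(\tilde{R}_i[t]\le C_i[t]\mid\hat{C}_i[t]=\hat{c}_i\big) + \rho\sum_{i=1}^{N} Q_i[t]\Big] \\
&= (1-\gamma)\sum_{i=1}^{N} Q_i[t]\cdot E\Big[E\big[\mathbf{1}(\tilde{I}[t]=i)\cdot\tilde{R}_i[t]\cdot\mathbf{1}(\tilde{R}_i[t]\le C_i[t])\,\big|\,Q[t],\hat{C}[t]\big]\,\Big|\,Q[t]\Big] + (1-\gamma)\,\rho\sum_{i=1}^{N} Q_i[t],
\end{aligned}
\tag{19}
\]

where the last inequality comes from Lemma 10. Substituting (19) into the Lyapunov drift expression (18), we have

\[
\Delta L[t] \le B_2 - 2\big(\epsilon - \rho(1-\gamma)\big)\sum_{i=1}^{N} Q_i[t].
\]

Since $\epsilon - \rho(1-\gamma) > 0$, the queues are stable.

□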